ABSTRACT 



This paper summarizes research on evaluating the 
distributional consequences of social programs, presenting the evaluation 
problem for an economy with two sectors (e.g., schooled and unschooled) where 
agents select or are selected into treatment (one of the two sectors). It 
considers policies affecting choices of treatment but not potential outcomes 
(the outcomes they experience under different treatments) and compares 
outcomes across two policy regimes that affect treatment choices. This task 
is easier when individuals respond in the same way to treatment than when 
they differ in their response and act on those differences in making 
treatment choice decisions. In the latter case, the marginal entrant into 
schooling is not the same as the average participant in treatment, and the 
representative agent paradigm breaks down. The paper estimates the 
distributional consequences of two proposed policy reforms in U.S. education. 
Even though they barely affect the overall distribution of outcomes, and 
would be judged equivalent to the pre-policy origin state under the Veil of 
Ignorance criterion, they have substantial effects on a small group of people 
concentrated in the middle to high end of the pre-policy wage distribution. 

An appendix shows how to generate the counterfactual distributions of 
outcomes produced by alternative policies. (Contains 23 references.) (SM) 
1 Introduction 



Evaluating public policy is a central task of economics. Welfare economics 
presents different criteria. Research on program evaluation develops and applies 
a variety of different econometric estimators. Traditional empirical methods fo- 
cus on mean impacts. Yet modern welfare economics emphasizes the importance 
of accounting for the impact of public policy on distributions of outcomes (Sen, 
1997, 2000). A large body of empirical evidence indicates that people differ in 
their responses to the same policy and act on those differences, and that the 
representative agent paradigm is a poor approximation to reality because the 
marginal entrant into a social program is often different from the average 
participant (Heckman, 2001a). This evidence highlights the importance of going 
beyond the representative agent framework when evaluating public policies. 

This paper summarizes our recent research on evaluating the distributional 
consequences of public policy. 1 Our research advances the economic policy eval- 
uation literature beyond estimating assorted mean impacts to estimate the dis- 
tributions of outcomes generated by different policies and to determine how 
those policies shift persons across the distributions of potential outcomes pro- 
duced by them. We distinguish the average participant in a program from the 
marginal entrant. 

1 Carneiro, Hansen and Heckman (2000, revised 2001). 

Our research advances the existing literature on evaluating the distributional 
consequences of alternative policies beyond the “Veil of Ignorance” assumption 
used in modern welfare economics (see Atkinson 1970, Sen 1997, 2000). Approaches 
based on that assumption compare two social states by assuming that 
the position of any particular individual in one distribution should be treated 
as irrelevant. In this approach the overall distribution of outcomes is all that 
matters. This is a consequence of the anonymity postulate that is fundamen- 
tal to that literature. Anonymity is the property that only the distribution of 
outcomes matters and that reversing the positions of any two persons in the 
overall distribution does not affect the evaluation placed on the policy (or state 
of affairs) that produces the distribution. 

There are normative arguments that support this criterion (see Harsanyi, 
1955; Vickrey, 1960; and Roemer, 1996). As a positive description of actual 
social choice processes, however, the “Veil of Ignorance” seems implausible. Participants 
in the political process are likely to forecast their outcomes under alternative 
economic policies, and assess policies in this light (Heckman, 2001b). This 
paper extends current practice by developing and applying methods that fore- 
cast how people fare under different policies. We link the literature in modern 
welfare economics to the treatment effect literature. 

This paper proceeds as follows. We briefly present the evaluation problem for 
an economy with two sectors (e.g. schooled and unschooled) where agents select 
or are selected into “treatment” (one of the two sectors). We consider policies 
that affect choices of treatment (e.g. schooling) but not potential outcomes (the 
outcomes they experience under different treatments). We compare outcomes 
across two policy regimes that affect treatment choices. This task is much easier 
when individuals respond in the same way to treatment than when they differ in 
their response to treatment, and act on those differences in making treatment 
choice decisions. In the latter case, the marginal entrant into schooling is 
not the same as the average participant in treatment and the representative 
agent paradigm breaks down. In an appendix, we show how to generate the 
counterfactual distributions of outcomes produced by alternative policies. 

We apply our analysis to estimate the distributional consequences of two 
proposed policy reforms in American education. Even though the two policies 
barely affect the overall distribution of outcomes, and so would be judged to be 
equivalent to the pre-policy origin state under the Veil of Ignorance criterion, 
they have substantial effects on a small group of people concentrated in the 
middle to the high end of the pre-policy wage distribution. Marginal entrants 
attracted into college get smaller gains than average college students suggesting 
diminishing returns to programs that encourage college enrollment. Marginal 
entrants into junior college are about the same as average entrants, suggesting 
constant returns for that schooling level. Since most of the people affected by the 
policies come from the middle to the high end of the original wage distribution, 
there is little impact of these policies on the poor. 

2 The Evaluation Problem for Means and Distributions 

In order to place our work in the context of the current literature on social 
program evaluation, and to link it to the economics of education, it is helpful to 
consider a simple generalized Roy (1951) economy with two sectors. Let $S = 1$ 
denote college and $S = 0$ denote high school. Persons (or their agents, such as their 
parents) can choose to be in either sector. There are two potential outcomes 
for each person, $(Y_0, Y_1)$, only one of which is observed, since it is assumed that 
only one option can be pursued at any time. For simplicity, we assume that the 
decision rule governing sectoral choices is 

$$S = \begin{cases} 1 & \text{if } I = Y_1 - Y_0 - C > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Here $C$ is the cost of choosing $S = 1$. In the context of a schooling model, $C$ is 
tuition or monetized psychic cost, while $Y_1 - Y_0$ is the net gain from schooling 
expressed, say, in present value terms. 

We decompose $Y_1$ and $Y_0$ in terms of their means $\mu_1$ and $\mu_0$ and mean zero 
idiosyncratic deviations $(U_1, U_0)$ or residuals: 

$$Y_1 = \mu_1 + U_1,$$
$$Y_0 = \mu_0 + U_0.$$

We condition on $X$ variables, but for notational simplicity we keep this dependence 
implicit. Decomposing $C$ in a similar fashion, we may write: 

$$C = \mu_C + U_C,$$

so that 

$$I = \mu_1 - \mu_0 - \mu_C + (U_1 - U_0 - U_C).$$
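
To make the decision rule and the decomposition concrete, the following Python sketch simulates a generalized Roy economy. It is an illustration only and not the estimation procedure of our source paper: the function name `simulate_roy`, the normality of $(U_1, U_0, U_C)$, their independence, and all parameter values are assumptions chosen for exposition.

```python
import numpy as np

def simulate_roy(n=200_000, mu1=2.5, mu0=2.0, mu_c=0.3, seed=0):
    """Simulate a two-sector generalized Roy economy (illustrative values only)."""
    rng = np.random.default_rng(seed)
    # Mean-zero idiosyncratic components (assumed independent normals).
    u1 = rng.normal(0.0, 1.0, n)
    u0 = rng.normal(0.0, 0.5, n)
    uc = rng.normal(0.0, 0.5, n)
    y1 = mu1 + u1                    # Y1 = mu_1 + U1
    y0 = mu0 + u0                    # Y0 = mu_0 + U0
    c = mu_c + uc                    # C  = mu_C + UC
    index = y1 - y0 - c              # I  = Y1 - Y0 - C
    s = (index > 0).astype(int)      # S = 1 if I > 0, 0 otherwise
    y_obs = s * y1 + (1 - s) * y0    # only one potential outcome is observed
    return {"y1": y1, "y0": y0, "c": c, "index": index, "s": s, "y_obs": y_obs}

econ = simulate_roy()
ate = np.mean(econ["y1"] - econ["y0"])
naive = (econ["y_obs"][econ["s"] == 1].mean()
         - econ["y_obs"][econ["s"] == 0].mean())
print(f"share choosing S = 1:            {econ['s'].mean():.3f}")
print(f"population mean of Y1 - Y0:      {ate:.3f}")
print(f"observed gap between sectors:    {naive:.3f}")
```

Because agents in this sketch sort on $Y_1 - Y_0 - C$, the observed gap between the two sectors need not equal the population mean gain, which is the selection problem discussed in the text.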

It is fruitful to distinguish two kinds of policies: (a) those that affect potential 
outcomes $(Y_0, Y_1)$ through price and quality effects and (b) those that affect 
sectoral choices (through $C$) but do not affect potential outcomes. Tuition and 
access policies that do not have general equilibrium effects fall into the second 
category of policy. Policies with general equilibrium effects and policies that 
directly affect rewards to potential outcomes and quality are examples of the 
first kind of policy. It is the second kind of policy that receives the most at- 
tention in empirical work on estimating economic returns to schooling (see e.g., 
the survey by Card (1999)) or in evaluating schooling policies (see e.g., Kane 
(1994)). 

Consider two policy environments denoted A and B . These produce two 
social states for outcomes that we wish to compare. In the general case, we may 
distinguish an economy operating under policy A with associated cost and outcome 
vector $(Y_0^A, Y_1^A, C^A)$ for each person, from an economy operating under 
policy B with associated cost and outcome vector $(Y_0^B, Y_1^B, C^B)$. Policy interventions 
with no effect on potential outcomes can be described as producing two 
choice sets $(Y_0, Y_1, C^A)$ and $(Y_0, Y_1, C^B)$ for each person. In this paper we focus 
on evaluating the second kind of policy that keeps invariant the distribution of 
potential outcomes across policy states, but affects the cost of choosing sector 
1 within each state. 

Our framework differs in its emphasis from the standard model of modern 
welfare economics. Analysts writing in that tradition focus on the distribution 
of outcomes produced by each policy without inquiring how those outcomes are 
produced. All policies that produce the same aggregate outcome distributions 
are judged to be equally good. The details of how the observed distribution 
is produced are deemed irrelevant. The distinctions we make between policies 
that affect potential outcomes and policies that affect which potential outcomes 
are selected are also ignored in that literature. There is no explicit discussion 
of sectoral choice within policy states. The literature starts, and stops, with an 
analysis of distributions of the observed outcomes for each person in each policy 
state $(Y^A, Y^B)$ defined as 

$$Y^A = S_A Y_1^A + (1 - S_A) Y_0^A, \quad \text{and} \quad Y^B = S_B Y_1^B + (1 - S_B) Y_0^B,$$

where $S_A$ and $S_B$ are schooling choice indicators under policies A and B respectively, 
without inquiring more deeply into the sources of the differences in the 
distributions of outcomes. 

The modern treatment effect literature focuses on these details and distin- 
guishes choice of treatments from the treatment outcomes. However, it only 
inquires about certain mean treatment effects. The operating assumption in 
the literature is that policies do not affect potential outcomes (so $(Y_0^A, Y_1^A) = 
(Y_0^B, Y_1^B)$), but do affect choices of sectors. 

This literature distinguishes three cases. Case I arises when everyone (with 
the same $X$) gets the same effect from treatment ($Y_1 - Y_0$ is the same for 
everyone). Case II occurs when $Y_1 - Y_0$ differs among people of the same $X$ 
but decisions to enroll in the program are not affected by these differences: 

$$\Pr(S = 1 \mid Y_1 - Y_0) = \Pr(S = 1).$$

Case III occurs when $Y_1 - Y_0$ differs among people and people act on these 
differences. In cases I and II, the marginal entrant into a program is the same 
as the average entrant. In case III, this is not so. People select in part on gains. 
If they select solely on gains, then the marginal entrant gets a lower return than 
those participants (in “1”) who are inframarginal; that is, the marginal treatment 
effect (MTE) satisfies 

$$E(Y_1 - Y_0 \mid I = 0) < E(Y_1 - Y_0 \mid S = 1).$$

See Heckman (2001a) for more discussion of the various treatment effects. 2 
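
The distinction among the three cases can be illustrated numerically. The sketch below is a hypothetical simulation and not part of our empirical analysis: the function `marginal_vs_average`, the normal distributions, the parameter values, and the approximation of the margin $I = 0$ by a narrow band $|I| < 0.05$ are all assumptions made for exposition. It returns the mean gain of persons near the margin and the mean gain of participants.

```python
import numpy as np

def marginal_vs_average(case, n=400_000, seed=1, band=0.05):
    """Return (mean gain near I = 0, mean gain given S = 1) for case I, II or III."""
    rng = np.random.default_rng(seed)
    mean_gain, mean_cost = 0.5, 0.3          # illustrative values
    cost = mean_cost + rng.normal(0.0, 1.0, n)
    if case == "I":
        gain = np.full(n, mean_gain)         # Y1 - Y0 identical for everyone
    else:
        gain = mean_gain + rng.normal(0.0, 1.0, n)   # heterogeneous gains
    if case == "III":
        index = gain - cost                  # people act on their own gain
    else:
        index = mean_gain - cost             # choice unrelated to individual gain
    participants = index > 0
    near_margin = np.abs(index) < band       # approximates I = 0
    return gain[near_margin].mean(), gain[participants].mean()

for case in ("I", "II", "III"):
    marginal, average = marginal_vs_average(case)
    print(f"case {case}: marginal gain = {marginal:.3f}, "
          f"average participant gain = {average:.3f}")
```

Under these assumptions the two means coincide in cases I and II up to sampling error, while in case III the persons near the margin have a lower mean gain than the average participant.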



3 Comparing Two Policy States 

Consider two policies, A and B, that affect sectoral choices without affecting 
the distributions of potential outcomes. For concreteness, we can think 
of these as policies that affect $C$ (e.g., tuition or access) by shifting its mean, 
changing its variance or changing the covariance between $C$ and $(Y_0, Y_1)$. Each 
policy produces a distribution of outcomes. For concreteness, think of the 
outcome as wages associated with different schooling levels. 

2 Björklund and Moffitt (1987) introduced the marginal treatment effect into the evaluation 
literature. See Heckman (2001a) for a summary of extensions of this literature. 

In the literature on evaluating inequality, comparisons of policies are made 
in terms of comparisons of distributions. If policy B produces an aggregate 
distribution of wages that stochastically dominates that produced from policy 
A, B is preferred. 3 The details of who benefits or loses from the policy are 
considered to be irrelevant as a consequence of the anonymity postulate. 

The literature on evaluating inequality in modern welfare economics com- 
pares two aggregate outcome distributions. If policy A has been implemented, 
but policy B has not, evaluation of B entails construction of a counterfactual 
aggregate outcome distribution. Under the assumptions used in the treatment 
effect literature, all that is required is determination of how policy B sorts per- 
sons into sectors “0” and “1”, and how such sorting affects observed outcome 
distributions in sectors “0” and “1”. In our example, what is required is a 
schooling choice equation and a selection model to identify the invariant po- 
tential outcome distributions. The selection model enables analysts to go from 
observed (selected) distributions of $Y_0$ and $Y_1$ to the population potential distributions. 
With sufficient individual variation in $C$ within an economy governed 
by policy A, it is possible to accurately forecast the effect of policy B on the 
overall distribution without previously observing it, as we demonstrate in this 
paper. 

3 See Sen (1997). 

Our approach to the evaluation of public policy is more ambitious in some 
respects than the recent literature in welfare economics and is more in line with 
the objectives of modern political economy (Persson and Tabellini, 2000). 
We relax the anonymity postulate and determine how individuals at different 
positions within the initial overall distribution respond to policies in terms of 
their treatment choices and gains. We estimate the number of people directly 
affected by the policy, where they start, and where they end up in the overall 
distribution. 

In the context of the treatment effect framework, this task is broken down 
into two sub-tasks. The first sub-task is to determine who shifts treatment 
state in response to the policy and where they are located in the initial overall 
distribution. The second sub-task is to determine where they end up in the 
overall distribution after taking the treatment, and how much they gain. Since 
this approach assumes that potential outcome distributions are not affected by 
the policies, it is less ambitious, in this respect, than the approach advocated 
in modern welfare economics which entertains that possibility. 

Under case I, this task is greatly simplified. Everyone who shifts from “0” 
to “1” gets the same gain $\Delta = Y_1 - Y_0$. The only problem is to find where in the initial 
overall distribution the switchers are located. Under case II, $\Delta$ varies among 
observationally identical people. The gain is not necessarily the same for persons 
with different initial $Y_0$ values. However, on average, across all movers, the 
gain is the same as the mean difference between the two potential outcome 
distributions within policy regime A. Hence the marginal entrant has the same 
mean gain as the average person and the average participant: 

$$E(Y_1 - Y_0 \mid I = 0) = E(Y_1 - Y_0) = E(Y_1 - Y_0 \mid S = 1).$$



Case III differs from case II in that in general the gains to the average 
switcher are not the same as the gains to the previous participants. If $(Y_1 - Y_0)$ 
is positively correlated with $I = (Y_1 - Y_0 - C)$, the marginal entrant receives lower 
gains on average than does the average participant. The details of constructing 
the transition densities for the switchers are presented in our companion paper. 
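
The two sub-tasks described in this section (locating the switchers in the initial distribution, and comparing aggregate distributions across policy states) can be sketched in a few lines of Python. This is a stylized illustration, not the transition-density construction of our companion paper: the lognormal wages, the cost distribution, the size of the hypothetical subsidy, and the helper names `gini` and `choose` are all assumptions.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a positive-valued sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

rng = np.random.default_rng(2)
n = 200_000
# Potential outcomes are the same under both policies (illustrative values).
y0 = np.exp(2.0 + rng.normal(0.0, 0.4, n))
y1 = y0 * np.exp(0.3 + rng.normal(0.0, 0.3, n))
cost = 2.0 + rng.normal(0.0, 3.0, n)

def choose(subsidy):
    """S = 1 if Y1 - Y0 - C > 0, where the policy lowers C by `subsidy`."""
    return (y1 - y0 - (cost - subsidy)) > 0

s_a = choose(0.0)                 # policy A: no subsidy
s_b = choose(0.5)                 # policy B: hypothetical subsidy of 0.5
y_a = np.where(s_a, y1, y0)       # observed outcome under A
y_b = np.where(s_b, y1, y0)       # observed outcome under B

# Sub-task 1: who switches, and where are they in the initial distribution?
switchers = s_b & ~s_a
edges = np.quantile(y_a, np.linspace(0.1, 0.9, 9))
decile = np.digitize(y_a, edges) + 1          # deciles 1..10 of Y under A
share_by_decile = np.array(
    [switchers[decile == d].mean() for d in range(1, 11)]
)

# Sub-task 2: how much do switchers gain, and does the aggregate index move?
print("share of switchers by initial decile:", np.round(share_by_decile, 3))
print("mean gain of switchers:", round(float((y1 - y0)[switchers].mean()), 3))
print("Gini under A and B:", round(gini(y_a), 4), round(gini(y_b), 4))
```

The design choice here is to hold $(Y_0, Y_1)$ fixed and vary only the cost, which mirrors the class of policies considered in this paper. An aggregate index such as the Gini coefficient can then be compared with the decile-level tabulation of switchers.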

4 Identifying Counterfactual Distributions Under Treatment Effect Assumptions 

Identifying the joint distribution of potential outcomes under treatment 
effect assumptions is more difficult than identifying the various mean treatment 
effects. 4 The fundamental problem is that we never observe both components 
of $(Y_0, Y_1)$ for anyone. 5 Thus we cannot directly form the joint distribution of 
potential outcomes $(Y_0, Y_1)$. 

4 A large econometric literature identifies the mean impacts under a variety of assumptions. 
See Heckman, LaLonde and Smith (1999) for one survey. Heckman and Vytlacil (2000, 2001) 
consider identification of marginal treatment effects and unify the treatment effect literature. 

5 Panel data estimators sometimes enable analysts to observe both components. See Heckman, 
LaLonde and Smith (1999). 

In the Appendix, we review various approaches to estimating, or bounding, 
counterfactual distributions that have appeared in the literature. In our source 
paper, we develop a new method for identifying these distributions. It is based 
on an idea common in factor analysis but applied to model counterfactual distributions. 
If potential outcomes are generated by a low dimensional set of 
factors, then it is possible to estimate the distributions of factors and generate 
distributions of the counterfactuals. Here, low dimensional refers to the number 
of factors relative to the number of measured outcomes. See the Appendix for 
the intuitive idea that motivates the analysis in our source paper. We next 
turn to an application of our analysis to American data. 



5 Some Evidence From America on Two Educational Reforms 



Our companion paper uses data on wages, schooling choices and covariates 
for white males from the National Longitudinal Study of Youth (NLSY) to 
estimate a three factor version of the model described in the Appendix using 
a Bayesian semiparametric mixture of normals econometric framework. We 
consider four schooling levels: dropout, high school graduate, junior college 
and four year college. We use local labor market variables, tuition and family 
background information to identify the model. The estimated model fits the 
data well. Observed wage distributions are closely approximated. There is no 
need for more than three factors to fit our data which includes panel data 
measurements on wages as well as indicators of ability and motivation. 6 

6 The factor model is strongly overidentified so that it would have been possible to estimate 
many more than three factors. 

Our paper estimates models for a variety of schooling groups. Here, for the 
sake of brevity, we focus only on certain key empirical results. We report the 
wage returns to college and high school, and selection on levels and gains into 
those schooling categories. We analyze two policies: (a) a full tuition subsidy 
for junior colleges and (b) a policy promoting access to four year colleges which 
places an institution in the immediate vicinity (the county of residence) of each 
American. We consider only partial equilibrium treatment effects and do not 
consider the full cost of financing the reforms. 

Our evidence shows considerable dispersion in terms of levels and returns 
(gains) to various schooling categories. Indeed, ex post returns are negative for 
a substantial fraction of people. There is little evidence of selection either on 
levels or gains for high school graduates. There is a lot of evidence of selection 
on levels and gains for college graduates. The marginal entrants into four year 
colleges induced by the access policy we consider have wage outcomes below the 
average college participant both in terms of levels and gains. This is not true 
for the junior college tuition subsidy policy we also analyze. For that case, there 
is little impact on overall quality of junior college graduates. 

Figure 1 shows the potential high school wages for all four schooling groups: 
what people who actually attend various schooling levels would have earned had 
they gone to high school. The four densities are nearly the same suggesting that 
there is little evidence of selection on levels into high school. Three of these four 
densities are counterfactual. The density for high school graduates is factual. 
For college (Figure 2), there is strong evidence of selection on levels. Persons 
who attend college do better in college than dropouts, high school graduates 
or junior college graduates would do. This result contrasts sharply with the 
corresponding result for the factual and counterfactual wage densities for high 
school graduates. 

There is also little evidence of selection on gains $(Y_1 - Y_0)$ to high school 
(high school vs. dropout). See Figure 3 which plots the counterfactual returns 
to high school for all four schooling groups. The returns (high school vs. four 
year college) are greater for persons who become college graduates than for the 
other schooling groups, although there is a lot of overlap in the distributions. 
See Figure 4. Ex post, many persons who actually stop their schooling at the 
high school level would make fine college graduates. Many college graduates experience 
negative returns. The marginal treatment effect comparing high school 
to college (Figure 6) suggests that as the unobservables that lead to a higher 
likelihood of attending college increase (so $P(S = 1)$ increases), the return to college 
increases. People most likely to attend college have the highest marginal 
returns. The corresponding figure for the return to high school is flat, suggesting 
that the marginal participant has the same return as the average participant. 

Using the estimated model, we compare two policies: a full subsidy to com- 
munity college tuition and a policy that places a four year college in each county 
in America. Table 1 shows the average log wages of participants before the pol- 
icy change and their average return. It compares these levels and returns with 
what the marginal participant attracted into the indicated schooling by the pol- 
icy would earn. Marginal and average log wages and returns are about the 
same for the community college policy. There is little decline in quality among 
the entrants. For the access policy, there is a sharp difference. Average participants 
in four year colleges earn more and have higher returns than marginal 
entrants. There is a sharp decline in the average quality of college graduates. 

Despite the substantial sizes of the policy changes we consider, the induced 
effects on participation are small. The four year access policy only raises four 
year graduation rates by 1.3 percent. The junior college subsidy raises atten- 
dance at those institutions by 3.8 percent. 

The policies operate unevenly over the deciles of the initial outcome distri- 
bution. Mobility is greatest at the center of the distribution for the community 
college policy. See Table 2 and Figure 6. Mobility is from the top of the initial 
wage distribution for the four year college policy. See Table 3 and Figure 7. 
Neither policy benefits the poor. 

Our approach to the evaluation of social policy is much richer, and more 
informative, than an analysis of aggregate outcomes of the sort contemplated 
in modern welfare theory. The overall Gini coefficient does not change (to two 
decimal points) when we implement the two policies. By the standards of that 
literature, the pre- and post-policy distributions are the same. A focus on the 
aggregate outcome distribution masks important details which our approach 
reveals. Only a small group of persons are directly affected by the policy. The 
vast majority of persons would be unaffected by these policies, and presumably, 
would be indifferent to the policy. 7 Our approach to policy evaluation lifts the 
Veil of Ignorance and provides a more complete interpretation of who benefits 
from the policy and where beneficiaries come from in the overall distribution of 
outcomes. 

7 Counting their tax burden, they might even be hostile to these policies. 



6 Summary and Conclusions 

This paper summarizes our recent research on evaluating the distributional 
consequences of social programs. We move beyond the mean treatment effects 
that dominate discussion in the recent applied evaluation literature to analyze 
the impacts of policy on distributions of outcomes. We develop and apply 
methods for determining which persons are affected by the policy, where they 
come from in the initial distribution, and what their gains are. 

We contrast the outcomes of participants in schooling before the policy 
change with the outcomes of marginal entrants induced into the treatment state 
by the policy. We compare our approach to the approach advocated in mod- 
ern welfare economics. That approach focuses attention solely on the aggregate 
distribution and does not identify gainers and losers from a policy. Our ap- 
proach identifies where gainers and losers are located in the overall distribution. 
The output produced from our approach generates the information required in 
positive political economy. 

Our analysis has been conducted for a partial equilibrium treatment effect 
model that assumes that policies do not affect the distribution of potential 
outcomes, just the choice probabilities of particular treatments. It would be de- 
sirable to extend our framework to analyze the effects of more general policies 
that affect both outcome distributions and choices using the general equilibrium 
framework described in Heckman (2001b). We leave that task for another 
occasion. 
Appendix 

Identifying Counterfactual Distributions Under Treatment Effect Assumptions 

Heckman and Honoré (1990) show that in the context of the original Roy 
(1951) model under normality or exclusion restrictions, it is possible to identify 
the joint density of potential outcomes. The original Roy model sets $C = 0$. 
Sectoral choices are then determined solely by potential outcomes. This 
extra information identifies the full model and lets analysts identify the joint 
distributions of outcomes across policy states. If there is variation in $C$ across 
persons, this method breaks down and it is only possible to identify $g(Y_0 \mid S = 0)$ 
and $g(Y_1 \mid S = 1)$, the conditional densities of the potential outcomes, as well 
as $\Pr(S = 1)$, but not the joint density, $g(Y_0, Y_1)$ (Heckman (1990)). Another 
special case, discussed in Heckman (1992), is case I where $Y_1 = Y_0 + \Delta$, 
and $\Delta$ is a constant. Then from the marginal distribution of $Y_0$ or $Y_1$ it is 
possible to form the joint distribution of $(Y_0, Y_1)$, which is degenerate. Heckman 
and Smith (1993) and Heckman, Smith and Clements (1997) generalize this case 
to assume that the persons at the $q$th percentile in the density of $Y_0$ are at the $q$th 
percentile of $Y_1$. Even without imposing this information, from the marginals it 
is also possible to bound the joint densities using classical results in probability 
theory. In practice these bounds turn out to be rather wide (Heckman and 
Smith (1993); Heckman, Smith and Clements (1997)). 




In our source paper (Carneiro, Hansen and Heckman (2001)), we generate the 
distributions of potential outcomes using a panel data factor structure model. 
For the details of our method we refer the reader to our source paper. Here we 
present the intuitive idea that underlies our method and in the text we report its 
application. We discuss the most elementary case, leaving a complete discussion 
of the more general case for our companion paper. 

Suppose that the mean of $C$ depends on shifter variables $Z$ that do not 
affect (are independent of) potential outcomes $(Y_0, Y_1)$. These are instruments. 
Suppose that for some values of $Z$ within available samples we observe 

$$\Pr(S = 1 \mid Z) = 1, \quad Z \in \mathcal{Z}_1,$$

while for other values of $Z$ 

$$\Pr(S = 1 \mid Z) = 0, \quad Z \in \mathcal{Z}_0.$$

Thus if $Z$ is tuition, people who face a low tuition cost (possibly even a large 
subsidy) almost surely go to college, while those who face a very 
high tuition cost almost surely do not. 8 We assume 
that the distribution of potential outcomes is the same in these subsets as it 
is in the overall distribution. Thus we can identify the marginal distribution 
of $Y_1$ from the first sample and the marginal distribution of $Y_0$ from the second 
sample. 

8 This is the version of identification at infinity discussed in Heckman (1990). 
Within these samples, we observe post schooling outcomes 

$$Y_{0t}, \quad t = 1, \ldots, T, \quad \text{for } Z \in \mathcal{Z}_0,$$
$$Y_{1t}, \quad t = 1, \ldots, T, \quad \text{for } Z \in \mathcal{Z}_1.$$

From these data we can form the joint densities of each outcome over time, 
$f(y_{01}, \ldots, y_{0T})$ and $f(y_{11}, \ldots, y_{1T})$, but not the joint densities over time over both 
outcomes. 

Now suppose that $Y_{0t}$ and $Y_{1t}$ are both generated by a common factor $f$ 
(e.g., ability, motivation) so that 

$$Y_{0t} = \mu_{0t} + \alpha_{0t} f + \varepsilon_{0t},$$
$$Y_{1t} = \mu_{1t} + \alpha_{1t} f + \varepsilon_{1t}, \qquad t = 1, \ldots, T,$$

where the $\varepsilon_{0t}$ and $\varepsilon_{1t}$ are mutually independent of each other, of $f$, and of all other 
$\varepsilon_{0t'}$, $\varepsilon_{1t''}$, $t \neq t', t''$. 9 All of these error components are assumed to have mean 
zero. A common factor generates both potential outcomes. If we can get 
our hands on the distribution of the common factor, we can compute the joint 
distribution of counterfactuals up to some signs for the covariances. 

9 The means may depend on the covariates. 
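Within each regime we can compute the following covariances: 

$$\operatorname{Cov}(Y_{0t}, Y_{0t'}) = \alpha_{0t}\alpha_{0t'}\sigma_f^2, \quad t \neq t', \; t, t' = 1, \ldots, T, \quad \text{for } Z \in \mathcal{Z}_0,$$
$$\operatorname{Cov}(Y_{1t}, Y_{1t'}) = \alpha_{1t}\alpha_{1t'}\sigma_f^2, \quad t \neq t', \; t, t' = 1, \ldots, T, \quad \text{for } Z \in \mathcal{Z}_1.$$

For concreteness suppose $T = 3$, so we have three panel wage observations. 
Then, for $Z \in \mathcal{Z}_0$, 

$$\operatorname{Cov}(Y_{01}, Y_{02}) = \alpha_{01}\alpha_{02}\sigma_f^2,$$
$$\operatorname{Cov}(Y_{01}, Y_{03}) = \alpha_{01}\alpha_{03}\sigma_f^2,$$
$$\operatorname{Cov}(Y_{02}, Y_{03}) = \alpha_{02}\alpha_{03}\sigma_f^2,$$

and, for $Z \in \mathcal{Z}_1$, 

$$\operatorname{Cov}(Y_{11}, Y_{12}) = \alpha_{11}\alpha_{12}\sigma_f^2,$$
$$\operatorname{Cov}(Y_{11}, Y_{13}) = \alpha_{11}\alpha_{13}\sigma_f^2,$$
$$\operatorname{Cov}(Y_{12}, Y_{13}) = \alpha_{12}\alpha_{13}\sigma_f^2.$$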

If we assume $\alpha_{01} = 1$ or $\sigma_f^2 = 1$, we can identify all of the rest of the factor 
loadings. The proof is straightforward: 

$$\frac{\operatorname{Cov}(Y_{01}, Y_{02})}{\operatorname{Cov}(Y_{01}, Y_{03})} = \frac{\alpha_{02}}{\alpha_{03}}.$$

Given $\sigma_f^2 = 1$, we can use 

$$\operatorname{Cov}(Y_{02}, Y_{03}) = \alpha_{02}\alpha_{03}$$

to obtain 

$$(\alpha_{03})^2 = \frac{\operatorname{Cov}(Y_{01}, Y_{03})\,\operatorname{Cov}(Y_{02}, Y_{03})}{\operatorname{Cov}(Y_{01}, Y_{02})},$$

and we can identify $\alpha_{03}$ up to sign and hence can identify $\alpha_{02}$ and $\alpha_{01}$. If we 
normalize $\alpha_{01} = 1$, we can identify $\alpha_{02}$, $\alpha_{03}$ and $\sigma_f^2$. Using the data on $Y_1$, 
under either normalization, we can identify $\alpha_{11}$, $\alpha_{12}$, $\alpha_{13}$ up to sign since $\sigma_f^2$ 
is known. Since the sign of $f$ is unknown, the sign of the factor loadings is 
unknown. 

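With this information in hand, we can identify the variances of the uniquenesses, 
$\varepsilon_{0t}, \varepsilon_{1t}$, of the outcomes: 

$$\operatorname{Var}(\varepsilon_{0t}) = \operatorname{Var}(Y_{0t}) - \alpha_{0t}^2\sigma_f^2,$$
$$\operatorname{Var}(\varepsilon_{1t}) = \operatorname{Var}(Y_{1t}) - \alpha_{1t}^2\sigma_f^2, \qquad t = 1, \ldots, T.$$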
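
The identification argument for $T = 3$ can be checked numerically. The sketch below works within one regime under the normalization $\sigma_f^2 = 1$ and uses an exact population covariance matrix built from hypothetical loadings, so there is no sampling error. The function name `loadings_from_cov` and the numerical values are assumptions for illustration; the positive square root is an arbitrary sign choice, reflecting the fact that the loadings are identified only up to a common sign.

```python
import numpy as np

def loadings_from_cov(cov):
    """Recover one-factor loadings (up to a common sign) and uniqueness
    variances from a 3x3 covariance matrix, assuming sigma_f^2 = 1."""
    c12, c13, c23 = cov[0, 1], cov[0, 2], cov[1, 2]
    a3 = np.sqrt(c13 * c23 / c12)     # alpha_3^2 = c13 * c23 / c12
    a2 = a3 * c12 / c13               # from c12 / c13 = alpha_2 / alpha_3
    a1 = c13 / a3                     # from c13 = alpha_1 * alpha_3
    alpha = np.array([a1, a2, a3])
    uniq_var = np.diag(cov) - alpha**2   # Var(eps_t) = Var(Y_t) - alpha_t^2
    return alpha, uniq_var

# Hypothetical true values used to build the population covariance matrix.
true_alpha = np.array([0.8, 1.2, 0.5])
true_uniq = np.array([0.3, 0.2, 0.4])
cov = np.outer(true_alpha, true_alpha) + np.diag(true_uniq)

alpha_hat, uniq_hat = loadings_from_cov(cov)
print("recovered loadings:", alpha_hat)
print("recovered uniqueness variances:", uniq_hat)
```

With estimated rather than population covariances, the same formulas apply with sampling error, and the term under the square root must be positive for the calculation to go through.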

Suppose that $f$, $\varepsilon_{0t}$, $\varepsilon_{1t}$, $t = 1, \ldots, T$, are normally 
distributed. Then from the information just presented, obtained from the 
subsamples associated with $\mathcal{Z}_0$ and $\mathcal{Z}_1$, we can identify the density 
of $f$ and hence the joint density of $(Y_{01}, Y_{11}, \ldots, Y_{0T}, Y_{1T})$. Using 
the outcome data within schooling choices we can identify the distribution of 
$f$ and hence estimate the joint distribution of potential outcomes across 
schooling choices, provided that we fix the sign of the factor loadings. 
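
To see why the sign matters, it may help to write out the second moments that determine the joint normal density. The display below is a sketch under the assumption, in line with the factor structure used here, that the uniquenesses are mutually uncorrelated and uncorrelated with $f$. The regime index $s \in \{0, 1\}$ is notation introduced only for this illustration. 

```latex
\[
\mathrm{Var}(Y_{st}) = \alpha_{st}^2\sigma_f^2 + \mathrm{Var}(\varepsilon_{st}),
\qquad
\mathrm{Cov}(Y_{st}, Y_{s't'}) = \alpha_{st}\alpha_{s't'}\sigma_f^2
\quad \text{for } (s, t) \neq (s', t').
\]
```

The within-regime entries ($s = s'$) are unchanged if every loading in that regime changes sign. The cross-regime entries $\mathrm{Cov}(Y_{0t}, Y_{1t'}) = \alpha_{0t}\alpha_{1t'}\sigma_f^2$ change sign if the loadings of one regime are flipped and those of the other are not, so they are determined only once the relative sign is fixed. 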

In our companion paper we present two methods for resolving the ambiguity 
regarding the sign of the covariances. The first method explicitly models 
the choice process and uses the covariance between choices and outcomes to 
pin down the sign of the factor loadings and the covariances of the potential 
outcomes across schooling levels. The second method uses an indicator of the 
factor (e.g., an ability test) to resolve the sign problem. Both approaches rely 
on the same basic idea of using the covariances of $Y_{1t}$ and $Y_{0t}$ with a 
common third variable to identify the sign of the factor loading. The second 
approach is easier to motivate and we do so here. 

Suppose that we have access to one ability test for each person. Measured 
ability is 

\[
A = \mu_A(X) + \beta f + \varepsilon_A
\]

where $\mu_A(X)$ is the mean of ability, $X$ are the covariates predicting ability, 
and $\varepsilon_A$ is mutually independent of $(\varepsilon_{01}, \ldots, \varepsilon_{0T}, \varepsilon_{11}, \ldots, \varepsilon_{1T})$ 
and $f$. 

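We can compute 

\[
\mathrm{Cov}(A, Y_{0t}) = \beta\alpha_{0t}\sigma_f^2, \quad t = 1, \ldots, T,
\]

for persons who do not attend school (e.g., do not attend college) and 

\[
\mathrm{Cov}(A, Y_{1t'}) = \beta\alpha_{1t'}\sigma_f^2, \quad t' = 1, \ldots, T,
\]

for persons who attend school (e.g., college). Assuming $\beta \neq 0$ and $\alpha_{0t} \neq 0$ 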
we can identify 

\[
\frac{\mathrm{Cov}(A, Y_{1t'})}{\mathrm{Cov}(A, Y_{0t})} = \frac{\alpha_{1t'}}{\alpha_{0t}}, \quad t = 1, \ldots, T, \quad t' = 1, \ldots, T, \quad t \neq t',
\]

and hence we fix the sign of $\mathrm{Cov}(Y_{0t}, Y_{1t'})$ for all $t$ and $t'$. This 
resolves the ambiguity regarding the sign of the covariances. 
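
The sign-fixing step can be sketched in a few lines. The example assumes $\sigma_f^2 = 1$, population covariances, and hypothetical values for $\beta$, $\alpha_{0t}$ and $\alpha_{1t'}$; the function name `cross_regime_covariance` is an assumption made for this illustration. It uses the fact that $\alpha_{0t}^2$ is known from within-regime data even when the sign of $\alpha_{0t}$ is not. 

```python
def cross_regime_covariance(alpha0_sq, cov_a_y0, cov_a_y1, sigma_f_sq=1.0):
    """Signed Cov(Y_0t, Y_1t') built from the ability-test covariances:
    Cov(Y_0t, Y_1t') = [Cov(A, Y_1t') / Cov(A, Y_0t)] * alpha_0t^2 * sigma_f^2.
    Illustrative helper, not taken from the paper."""
    if cov_a_y0 == 0:
        raise ValueError("requires beta != 0 and alpha_0t != 0")
    ratio = cov_a_y1 / cov_a_y0          # equals alpha_1t' / alpha_0t
    return ratio * alpha0_sq * sigma_f_sq

# Hypothetical parameter values (sigma_f^2 = 1).
beta, alpha0, alpha1 = 0.6, 0.8, -1.1
cov_a_y0 = beta * alpha0                 # Cov(A, Y_0t)
cov_a_y1 = beta * alpha1                 # Cov(A, Y_1t')

signed_cov = cross_regime_covariance(alpha0 ** 2, cov_a_y0, cov_a_y1)

# The sign of beta cancels in the ratio, so a test that loads negatively
# on the factor gives the same answer.
signed_cov_neg_beta = cross_regime_covariance(
    alpha0 ** 2, -beta * alpha0, -beta * alpha1
)
```

In this sketch the recovered value equals $\alpha_{0t}\alpha_{1t'}\sigma_f^2$ with its correct sign, regardless of the sign of $\beta$. 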

In our companion paper we show that we can obtain this joint density without 
a normality assumption for $f$ or $\varepsilon_{0t}$, $\varepsilon_{1t}$, $t = 1, \ldots, T$. We extend 
our analysis to allow for vector $f$ so there may be many factors, not just one. 
We show that it is possible to nonparametrically identify the joint density of 
potential outcomes provided that the number of panel data wage measurements is 
large, in a sense we make precise in our companion paper, relative to the 
number of factors. 10 
We do not need to invoke "identification at infinity", i.e., we can dispense with 
the requirement that there are subsets of Z where there is no selection. We also 
consider a model with multiple discrete choices (schooling levels) instead of just 
two. With these counterfactual distributions determined, we can identify the 
impact of social policy on the distributions of outcomes and returns. 



10 In our companion paper, we show how indicators of $f$ can be used to supplement, or 
replace, panel data. This type of identification is familiar to users of LISREL (see Jöreskog 
and Sörbom, 1979). 

Figure 1 

Distributions of Wages, High School Graduates 

White Males, Age 29, from NLSY 

Figure 2 

Distributions of Wages, College 

White Males, Age 29, from NLSY 

Figure 3 

Distributions of Returns to High School 

White Males, Age 29, from NLSY 

Figure 4 

Distribution of Returns to College vs. High School 
White Males, Age 29, from NLSY 

Figure 5 

Marginal Treatment Effect 
High School - College 
NLSY, White Males 

Figure 6 

People affected by full subsidy to community college tuition, by decile of initial overall wage distribution 












Figure 7 

People affected by making distance to 4-year college = 0, by decile of initial overall wage distribution 

Table 1 

AVERAGE LOG WAGES AND RETURNS FOR AVERAGE AND MARGINAL PERSON 

Table 2 

MOBILITY OF PEOPLE AFFECTED BY FULL TUITION SUBSIDY 
Fraction of total population affected by policy: 0.038 

Table 3 

MOBILITY OF PEOPLE AFFECTED BY DISTANCE TO 4 YEAR COLLEGE 
Fraction of total population affected by policy: 0.013 